From LaTeX to HTML and back
Abstract
Both LaTeX and HTML are languages that can express the structure of a document, and this article shows the similarities between the two systems. It then studies in detail the LaTeX2HTML program, written by Nikos Drakos, which is today the most complete utility for translating LaTeX code into HTML and provides a quasi-automatic translation for most elements. A discussion of a few other tools for translating between HTML and LaTeX concludes the article.

1 Similarities between LaTeX and HTML

HTML and LaTeX are both generic markup systems; Table 1 compares their tags for structural elements. In most cases the differences are trivial, which suggests that, to a first approximation, translating between the two systems should not prove too difficult. The translation programs described in this article use these similarities, but in order to exploit the richness of the LaTeX language as compared to HTML (especially HTML2, which has no support for tables or mathematics), an ad hoc approach has to be adopted. To correctly handle LaTeX commands that have no equivalent in HTML, such elements can either be transformed into bitmap or PostScript pictures (the approach taken by LaTeX2HTML), or the user can specify how the given element should be handled in the target language.

2 Converting LaTeX into HTML

Before discussing the LaTeX2HTML program, we want to mention a few other programs. First there is l2x, written by Henning Schulzrinne (Berlin, Germany), which translates LaTeX into various other formats. This program is written in C and calls a Tcl function (Ousterhout, 1994) for each LaTeX command. A converter html.tcl is available for translating LaTeX files into HTML, by writing, for instance:

    l2x -p html.tcl article.tex

At present, only a subset of all LaTeX commands is handled (no mathematical formulae, tables, verbatim text, etc.), yet it is not too difficult to augment the code of the converter html.tcl by introducing new Tcl commands.
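The idea of one handler per LaTeX command, as described above for l2x, can be illustrated with a minimal sketch. This is not l2x's code: Python stands in for Tcl, the `HANDLERS` table and `convert` function are hypothetical names, `\section` is assumed to be the top sectioning level, and only commands with a single brace-delimited argument are covered. HTML escaping of special characters is deliberately left out.

```python
import re

# Hypothetical handler table: one small function per LaTeX command.
# The tag choices follow the usual structural correspondences
# (sectioning, emphasis, bold, fixed-width); coverage is illustrative only.
HANDLERS = {
    "section": lambda arg: f"<H1>{arg}</H1>",
    "subsection": lambda arg: f"<H2>{arg}</H2>",
    "emph": lambda arg: f"<EM>{arg}</EM>",
    "textit": lambda arg: f"<I>{arg}</I>",
    "textbf": lambda arg: f"<B>{arg}</B>",
    "texttt": lambda arg: f"<TT>{arg}</TT>",
}

# Matches a command whose argument contains no braces, i.e. an innermost call.
COMMAND = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def convert(source):
    """Rewrite innermost one-argument commands until nothing changes."""

    def replace(match):
        name, arg = match.group(1), match.group(2)
        handler = HANDLERS.get(name)
        # Commands without a handler are left untouched in the output.
        return handler(arg) if handler else match.group(0)

    previous = None
    while previous != source:
        previous = source
        source = COMMAND.sub(replace, source)
    return source


if __name__ == "__main__":
    print(convert(r"\section{Intro} Some \textbf{bold and \emph{nested}} text."))
```

Adding support for another command in this sketch means adding one entry to the table, which mirrors the point made above that such a converter can be augmented by introducing new per-command functions.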
Schwarzkopf (Schwarzkopf, 1995) has developed Hyperlatex, a package written in GNU Emacs Lisp that translates documents marked up in (a subset of) LaTeX into HTML. Another interesting tool is tex2RTF, a utility that converts LaTeX to four other formats, including HTML. It does a relatively good job for a subset of LaTeX commands, but, as with the Tcl approach of l2x, it cannot handle more complex structures, such as mathematical expressions and tables. Finally, although not directly relevant to LaTeX, texi2html translates texinfo sources into HTML.

3 The LaTeX2HTML Converter: Generalities

LaTeX2HTML is a program written in the perl programming language (Schwartz, 1993; Till, 1995; Wall and Schwartz, 1991) by Nikos Drakos. It transforms a LaTeX document into a series of HTML files linked in a way that reflects the structure of the original document.

3.1 What LaTeX2HTML is and What it is Not

LaTeX2HTML is a conversion tool that allows documents written in LaTeX to become part of the World Wide Web. In addition, it offers an easy migration path towards authoring complex hypermedia documents using familiar word-processing concepts.

LaTeX2HTML replicates the basic structure of a LaTeX document as a set of interconnected HTML files which can be explored using automatically generated navigation panels. The cross-references, citations, footnotes, the table of contents and the lists of figures and tables are also translated into hypertext links. Formatting information which has equivalent "tags" in HTML (lists, quotes, paragraph breaks, type styles, etc.) is also converted appropriately. The remaining heavily formatted items, such as mathematical equations, pictures or tables, are converted to images placed automatically at the correct positions in the final HTML document.

LaTeX2HTML extends LaTeX by supporting arbitrary hypertext links and symbolic cross-references between evolving remote documents. It also allows the specification of conditional text and the inclusion of raw HTML commands. These hypermedia extensions to LaTeX are available as new commands and environments from within a LaTeX document.

Table 1: Comparison of structural elements in HTML and LaTeX

    Description          HTML             LaTeX
    Sectioning commands
    level 1              <H1>             \chapter or \section
    level 2              <H2>             \section or \subsection
    level 3              <H3>             \subsection or \subsubsection
    level 4              <H4>             \subsubsection or \paragraph
    level 5              <H5>             \paragraph or \subparagraph
    level 6              <H6>             \subparagraph
    new paragraph        <P>              \par
    Lists
    numbered list        <OL>             \begin{enumerate}
    unnumbered list      <UL>             \begin{itemize}
    list element         <LI>             \item
    description list     <DL>             \begin{description}
    term                 <DT>             \item
    definition           <DD>             text
    Highlighting text
    emphasis             <EM>text</EM>    \emph{text}
    italic               <I>text</I>      \textit{text}
    bold                 <B>text</B>      \textbf{text}
    fixed-width          <TT>text</TT>    \texttt{text}

1 For l2x, see the URL http://info.cern.ch/hypertext/WWW/Tools/l2x.html.
2 The Hyperlatex documentation is available at the URL http://hobak.postech.ac.kr/~otfried/html/hyperlatex.html. Otfried Schwarzkopf can be reached via email at [email protected].
3 tex2RTF was written by Julian Smart (Edinburgh, Britain). For more information see the URL http://www.aiai.ed.ac.uk/~jacs/tex2rtf.html.
4 texi2html was written in perl by Lionel Cons (CERN, Geneva). For more information see the URL http://asis01.cern.ch/infohtml/texi2html.html.
5 texinfo is a TeX-based markup language used for all GNU project related documentation.
6 More information on perl can be found in the UF/NA perl archive at the URL http://www.cis.ufl.edu/perl/.
7 The LaTeX2HTML documentation is at the URL http://cbl.leeds.ac.uk/nikos/tex2html/doc/latex2html/latex2html.html. One can also join the LaTeX2HTML mailing list by sending a message to [email protected] whose only content is the line: subscribe.

TUGboat, Volume 16 (1995), No. 2 175
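The description of LaTeX2HTML as replicating a document's structure as interconnected HTML files with navigation panels can be made concrete with a small sketch. It is an illustration only, not LaTeX2HTML's implementation (which is written in perl): the `Node` class, the `split_sections` and `render` functions, and the `nodeN.html` file naming are assumptions made for this example, and only top-level `\section` commands are recognised.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    """One section of the source document, destined for its own HTML file."""
    title: str
    body: str


# A section heading whose title contains no nested braces.
SECTION = re.compile(r"\\section\{([^{}]*)\}")


def split_sections(source: str) -> List[Node]:
    """Cut the source at each section heading; text before the first is dropped."""
    parts = SECTION.split(source)
    # parts is [preamble, title1, body1, title2, body2, ...]
    return [Node(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


def render(nodes: List[Node]) -> Dict[str, str]:
    """Return a mapping from file name to HTML text, one file per node."""
    pages = {}
    for index, node in enumerate(nodes):
        links = []
        if index > 0:
            links.append(f'<A HREF="node{index}.html">Previous</A>')
        if index + 1 < len(nodes):
            links.append(f'<A HREF="node{index + 2}.html">Next</A>')
        navigation = " | ".join(links)
        pages[f"node{index + 1}.html"] = (
            f"{navigation}\n<H1>{node.title}</H1>\n<P>{node.body}\n"
        )
    return pages


if __name__ == "__main__":
    sample = r"\section{One} First part. \section{Two} Second part."
    for name, html in render(split_sections(sample)).items():
        print(name)
        print(html)
```

Each generated page carries links only to its neighbours; a fuller navigation panel would also point to the parent node and the table of contents, which this sketch omits for brevity.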
Related articles
HTML & TEX : Making them sweat
HTML is often criticised for its presentation-oriented conception. But it does contain sufficient structural information for many everyday purposes and this has led to its development into a more stable form. Future platforms for the World Wide Web may support other applications of SGML, and the present climate of popularity of the Web is a suitable opportunity for consolidation of the more sta...
Active TEX and the DOT Input Syntax
The usual category codes give TEX its familiar backslash and braces input syntax. With Active TEX, all characters are active. This gives the macro programmer complete freedom in defining the input syntax. It also provides a powerful programming environment. The DOT input syntax, like TROFF, uses a period at the start of the line as an escape character. However, its underlying element, attribute...
TEX to HTML Translation via Tagged DVI
This paper describes dvihtml, a program under development for translating a tagged DVI file into HTML. A common problem when translating TEX into another format is handling unexpected macros. Fortunately, TEX’s macro language is flexible enough to pass markup information to the DVI file in the form of \special’s, fonts and small horizontal or vertical movements. Translating the resulting DVI fi...
From LATEX to HTML and back Michel
Both LATEX and HTML are languages that can express the structure of a document, and similarities between these two systems are shown. A detailed study is made of the LaTeX2HTML program, written by Nikos Drakos, that is today the most complete utility for translating LATEX code into HTML, providing a quasi-automatic translation for most elements. A discussion of a few other tools for translating...
Managing TEX Resources with XML Topic Maps
For many years the Polish TEX Users Group newsletter has been published online on the GUST web site. The repository now contains valuable information on TEX, METAFONT, electronic document, computer graphics and related subjects. However, access to the content is very poor: it is available as PS/PDF files with only a simple HTML page facilitating navigation. There is no integration with informat...
From SGML to HTML with help from TEX
At this time there is still no fast and standard way of presenting mathematics in HTML pages. Various ideas have been tested and the W3C has just released a draft math markup proposal. When combined with freely available fonts containing the required mathematical characters, we can see much potential for the future. For now it seems that there is only one common denominator suitable for quickly...